Abstract

Nematode chemosensory GPCRs in Caenorhabditis elegans (NemChRs) are classified into 19 gene families and were initially
thought to have split from the ancestral Rhodopsin family of GPCRs. However, earlier studies have shown that among all 19
NemChR gene families, only the srw family has a clear sequence relationship to the ancestral Rhodopsin GPCR family. Yet,
the phylogenetic relationships between the srw family of NemChRs and the Rhodopsin subfamilies are not fully understood.
Moreover, no widespread search had previously been performed for putative srw family-like sequences, or for members of the
other 18 NemChR families, in protostome species outside the nematode lineage. In this study, we investigated the presence
of the 19 NemChR families across 26 eukaryotic species, covering basal eukaryotic branches, and provide the first evidence
that the srw family of NemChRs is present across several phyla of protostomes. We identified 29 putative orthologs of the
srw family in insects (15 genes), molluscs (11 genes) and Schistosoma mansoni (3 genes). Furthermore, using HMM-HMM
profile based comparisons and phylogenetic analysis, we show that among all Rhodopsin subfamilies, the peptide and SOG
(somatostatin/opioid/galanin) subfamilies are phylogenetically the closest relatives of the srw family of NemChRs. Taken
together, we demonstrate that the srw family split from the large Rhodopsin family, possibly from the peptide and/or SOG
subfamilies, well before the split of the nematode lineage, close to the divergence of the common ancestor of
protostomes. Our analysis also suggests that the srsx family of NemChRs shares clear sequence homology with the Rhodopsin
subfamilies, as well as with a few vertebrate olfactory receptors. Overall, this study provides further insights into the
evolutionary events that shaped the GPCR chemosensory system in protostome species.
Introduction

All animals recognize and respond to chemosensory information
in their environment. In most multicellular animals, the ability to
sense the environment relies largely on membrane-bound
chemosensory receptors, which detect environmental chemical
stimuli and convert them into intracellular responses [1,2]. In
most eukaryotes, the chemosensory receptors belong to the
superfamily of G protein-coupled receptors (GPCRs), which are
crucial for many physiological processes and constitute the
dominant signaling system in metazoans [2,3,4,5]. Chemosensory
GPCR families include the vertebrate olfactory receptors (ORs) [6],
trace amine-associated receptors (TAARs) [7], vomeronasal
receptors type 1 and 2 (VR 1 & 2) [8,9,10], taste receptors type 1 and 2
(TR 1 & 2) [11,12], and a large group of nematode chemosensory
receptors (NemChRs) [13].

Chemosensory GPCRs in the nematode worm Caenorhabditis
elegans (NemChRs) are classified into 19 gene families based on
sequence similarity and monophyletic clustering of genes [14].
Furthermore, 15 of these 19 gene families were grouped into three
major superfamilies named Sra, Str and Srg, which contain four
(sra, srab, srb and sre), five (srd, srh, sri, srj and str), and six (srg,
srt, sru, srv, srx and srxa) families, respectively [14]. The other four
families, srbc, srsx, srw and srz, were classified as "others"
because they share low sequence similarity with all other NemChR
families [14]. Like the NemChR families, ORs have undergone
expansions in many mammalian species and are by far the largest
mammalian gene family [2,15]. Although ORs have undergone
large expansions, only a small fraction of them were
found in the deuterostome invertebrates [16,17,18]. On the other
hand, the pheromone receptors VR type 1 and 2, and the taste
receptors TR type 1 (sweet) and 2 (bitter), are found to be specific
to vertebrates [19,20,21]. Considering these observations, it
appears that chemosensory GPCRs evolved multiple times
independently, as chemosensory GPCRs in deuterostomes (vertebrate
OR, TAAR, VR 1 & 2 and TR 1 & 2) are not closely related
to those found in protostomes (NemChRs).

Interestingly, all chemosensory GPCR gene families are
suggested to have a common origin despite their low sequence
similarity and the diverse nature of their signaling molecules.
Moreover, earlier studies have suggested that the chemosensory
GPCRs might have evolved from the ancient Rhodopsin family of
GPCRs [22]. Likewise, whole-genome studies as well as previous
GPCR mining studies in the basal eumetazoan lineages suggested
that the diversification of the large Rhodopsin family into
subfamilies, such as the amines and peptides, as well as into the
olfactory receptors, occurred before the protostome-deuterostome
split [22,23,24,25]. Further supporting this, a recent study
showed that the cnidarian Nematostella vectensis has 35 full-length
chordate-like OR genes [26]. This suggests that the common
ancestor of cnidarians and bilaterian animals had chordate-like
OR genes that expanded greatly in deuterostomes. However, these
chordate-like OR genes were subsequently lost in all protostomes,
which evolved a different chemosensory system that includes the
NemChRs.

Although earlier studies support the view that the chemosensory
GPCRs split from the large Rhodopsin family, little is known about
the relationships between the NemChR families and the Rhodopsin-like
GPCRs. Intriguingly, among all 19 NemChR gene families,
only the srw family has been identified as having a clear sequence
relationship with the subfamilies of the Rhodopsin (7tm_1)
superfamily [14,22]. Yet, the phylogenetic relationships between
the srw family of NemChRs and the Rhodopsin subfamilies are not
fully understood. Furthermore, the presence of putative homologs
of these 19 NemChR families in species other than nematodes has
not been thoroughly examined. In the current study, we
investigated the presence of NemChRs in 26 genomes that
span all eukaryotic supergroups. We demonstrate that the srw
family of NemChRs is found across several protostome phyla and
that it split from the large Rhodopsin family, possibly from the peptide
and the SOG subfamilies, well before the split of the nematode
lineage, close to the divergence of the common
ancestor of protostomes.

Materials and Methods 

Proteome dataset 

Proteomes were downloaded from Ensembl Metazoa
(http://metazoa.ensembl.org) for Anopheles gambiae, Acyrthosiphon
pisum, Apis mellifera, Pediculus humanus, Daphnia pulex,
Pristionchus pacificus and Schistosoma mansoni; Oryza sativa
and Arabidopsis thaliana proteomes were downloaded from
Ensembl Plants (http://plants.ensembl.org); Trypanosoma brucei
and Tetrahymena thermophila proteomes were downloaded from
Ensembl Protists (http://protists.ensembl.org); Homo sapiens,
Mus musculus, Gallus gallus, Xenopus tropicalis, Danio rerio
and Petromyzon marinus proteomes were downloaded from
Ensembl (http://www.ensembl.org); the C. elegans proteome
was downloaded from WormBase (http://www.wormbase.org);
N. vectensis, Trichoplax adhaerens, Phytophthora sojae, Thalassiosira
pseudonana, Lottia gigantea and Monosiga brevicollis
proteomes were downloaded from the Joint Genome Institute
(http://genome.jgi.doe.gov/); Drosophila melanogaster and Drosophila
willistoni proteomes were downloaded from FlyBase
(http://flybase.org); Dictyostelium discoideum and Dictyostelium
fasciculatum proteomes were downloaded from dictyBase
(www.dictybase.org); the Entamoeba histolytica proteome was
downloaded from AmoebaDB (http://amoebadb.org); the Paramecium
tetraurelia proteome was downloaded from NCBI
(http://www.ncbi.nlm.nih.gov/); the Trichomonas vaginalis proteome was
downloaded from TrichDB (http://trichdb.org), and the Giardia
lamblia proteome was downloaded from GiardiaDB
(http://giardiadb.org). Finally, the fungal proteome dataset was
downloaded from the UniProt database (http://www.uniprot.org/).



Identification of nematode chemosensory GPCRs (NemChRs)

The proteomes analyzed in this study were searched against a
local installation of the Pfam database version 26, which has 13672
families, with sensitive HMM models built using the HMMER3
software [27]. We used the Pfam_scan.pl script, obtained from
the Pfam FTP site, to align each of the 13672 HMM profiles with
our proteome dataset. The Pfam_scan.pl script applies the homology
criteria set by the Pfam database, which are based on a manually
curated gathering threshold for each model. The gathering
threshold of a Pfam model requires that a sequence attain a score
greater than or equal to the threshold to be deemed significant
and included in the Pfam-A full alignment. This is designed to
exclude false-positive sequences from the Pfam-A alignments
and thereby increases the accuracy of the Pfam-A models. About
79.9% of the proteins contained in the SWISS-PROT and
TrEMBL databases have at least one match to a Pfam-A family.
Using these HMM models, we retrieved all sequences that the
Pfam search assigned to the following 19 NemChR family domains
(HMM profiles): 7TM_GPCR_Sra (PF02117),
7TM_GPCR_Srab (PF10292), 7TM_GPCR_Srb (PF02175), sre
(PF03125), srg (PF02118), 7TM_GPCR_Srt (PF10321),
7TM_GPCR_Sru (PF10322), 7TM_GPCR_Srv (PF10323),
7TM_GPCR_Srx (PF10328), srxa (Serpentine_r_xa; PF03383),
7TM_GPCR_Srd (PF10317), 7TM_GPCR_Srh (PF10318),
7TM_GPCR_Sri (PF10327), 7TM_GPCR_Srj (PF10319),
7TM_GPCR_Str (PF10326), 7TM_GPCR_Srbc (PF10316),
7TM_GPCR_Srsx (PF10320), 7TM_GPCR_Srw (PF10324) and
7TM_GPCR_Srz (PF10325). We also included 7tm_4 (PF13853),
a new Pfam domain exclusive to the vertebrate-like olfactory
receptors, introduced in Pfam release 26. The separation of
the olfactory receptors from the conventional 7tm_1 (Rhodopsin)
domain facilitates direct identification of novel members of the
olfactory receptor family.
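The retrieval step can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes the default whitespace-separated output of Pfam_scan.pl with the column order given in the docstring (check it against the header of your own output file), assigns each sequence to its highest-scoring significant domain, and keeps sequences whose best hit is one of the accessions listed above. The function names and the example lines are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Pfam accessions of the 19 NemChR family domains listed above, plus 7tm_4 (PF13853).
NEMCHR_ACCESSIONS = {
    "PF02117", "PF10292", "PF02175", "PF03125", "PF02118", "PF10321",
    "PF10322", "PF10323", "PF10328", "PF03383", "PF10317", "PF10318",
    "PF10327", "PF10319", "PF10326", "PF10316", "PF10320", "PF10324",
    "PF10325", "PF13853",
}

# (accession, HMM name, bit score) of a sequence's best-scoring domain
Hit = Tuple[str, str, float]


def best_hits(lines: Iterable[str]) -> Dict[str, Hit]:
    """Keep the highest-scoring significant Pfam domain for each sequence.

    Assumed column layout (whitespace-separated, '#' lines are comments):
    0 seq_id, 5 hmm_acc, 6 hmm_name, 11 bit_score, 13 significance (1 = passed GA).
    """
    best: Dict[str, Hit] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split()
        seq_id, hmm_name = cols[0], cols[6]
        acc = cols[5].split(".")[0]  # drop the version suffix, e.g. PF10324.3 -> PF10324
        score, significant = float(cols[11]), cols[13] == "1"
        if not significant:  # below the family's gathering threshold
            continue
        if seq_id not in best or score > best[seq_id][2]:
            best[seq_id] = (acc, hmm_name, score)
    return best


def nemchr_candidates(lines: Iterable[str]) -> Dict[str, Hit]:
    """Sequences whose best-scoring domain is a NemChR family (or 7tm_4)."""
    return {s: h for s, h in best_hits(lines).items() if h[0] in NEMCHR_ACCESSIONS}


# Made-up example lines (not real results) in the assumed layout.
example = [
    "# seq_id aln_start aln_end env_start env_end hmm_acc hmm_name type ...",
    "seqA 10 290 5 295 PF10324.3 7TM_GPCR_Srw Family 1 300 310 150.2 1.1e-44 1 CL_demo",
    "seqA 20 280 15 285 PF00001.18 7tm_1 Family 3 260 268 90.5 2.0e-26 1 CL_demo",
    "seqB 30 300 25 305 PF00001.18 7tm_1 Family 1 268 268 200.0 1.0e-60 1 CL_demo",
]
print(nemchr_candidates(example))  # {'seqA': ('PF10324', '7TM_GPCR_Srw', 150.2)}
```

Here seqB is dropped because its best hit is 7tm_1, which mirrors the "highest scoring alignment" criterion used throughout this study.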

Identification and categorization of the Rhodopsin (7tm_1) family receptors

Using the same procedure described above, we searched the
proteomes of N. vectensis, T. adhaerens, D. melanogaster and C. elegans
for sequences whose highest-scoring HMM profile alignment was
to 7tm_1 (PF00001). To assign subfamily-level classifications
to the identified Rhodopsin family receptors, we
performed a standalone BLASTP search against the human
GPCRs. We used standard default settings for the BLASTP
searches, with a word size of 3 and the BLOSUM62 scoring matrix.
We downloaded the Rhodopsin family GPCRs from our human
GPCR repertoire [28] and tagged them with their subfamily
classification. The Rhodopsin family receptors from all
four species (N. vectensis, T. adhaerens, D. melanogaster and C. elegans)
were then searched against a database of these tagged human
Rhodopsin family GPCRs. To be assigned to a subfamily, a sequence
had to have at least four of its five best BLASTP hits from the
same subfamily. Thereafter, each subfamily in each of the four species was
separately aligned using the MAFFT program, with option E-INS-i
and BLOSUM62 as the scoring matrix. Each alignment was
then examined and refined in Jalview 2.5.1; that is, the sequences
were trimmed so that only well-conserved and well-aligned regions were kept.
Hidden Markov Models (HMMs) were then constructed from the
alignments using the hmmbuild program of the HMMER3 package
with default settings.
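The "four of the five best hits" rule can be sketched as below. This is an illustrative assumption, not the authors' code: it reads standard BLAST tabular output (-outfmt 6, where column 12 is the bit score), keeps the best HSP per subject, and assumes the human subject IDs were tagged as `<subfamily>|<gene>`, a hypothetical convention chosen only for this example. Sequences that fail the rule are returned as `None`, i.e. left unclassified.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def parse_blast_tab(lines: Iterable[str]) -> Dict[str, List[Tuple[str, float]]]:
    """Group BLASTP tabular hits (-outfmt 6) by query, keeping the best HSP per subject.

    Columns used: 0 qseqid, 1 sseqid, 11 bitscore.
    """
    per_query: Dict[str, Dict[str, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject, bits = cols[0], cols[1], float(cols[11])
        subjects = per_query.setdefault(query, {})
        subjects[subject] = max(bits, subjects.get(subject, bits))
    return {q: list(s.items()) for q, s in per_query.items()}


def classify(hits: List[Tuple[str, float]], top_n: int = 5,
             min_agree: int = 4, sep: str = "|") -> Optional[str]:
    """Return the subfamily shared by >= min_agree of the top_n hits, else None.

    Subject IDs are assumed to be tagged '<subfamily>|<gene>' (hypothetical convention).
    """
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)[:top_n]
    labels = [subject.split(sep, 1)[0] for subject, _ in ranked]
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agree else None  # None = "unclassified"


def row(q: str, s: str, bits: float) -> str:
    """Build a made-up -outfmt 6 line; only qseqid, sseqid and bitscore matter here."""
    # qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    return "\t".join([q, s, "40.0", "300", "0", "0", "1", "300", "1", "300", "1e-30", str(bits)])


demo = [
    row("q1", "peptide|A", 210), row("q1", "peptide|B", 200), row("q1", "amine|C", 190),
    row("q1", "peptide|D", 180), row("q1", "peptide|E", 170), row("q1", "SOG|F", 160),
    row("q2", "peptide|A", 150), row("q2", "SOG|B", 149), row("q2", "peptide|C", 140),
    row("q2", "SOG|D", 130), row("q2", "peptide|E", 120),
]
hits = parse_blast_tab(demo)
print({q: classify(h) for q, h in hits.items()})  # {'q1': 'peptide', 'q2': None}
```

Query q2 has only three peptide hits among its top five, so under this rule it would remain unclassified, like the black-labelled sequences in Figure 4.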
HMM-HMM profile comparisons

To compare two HMM profiles, we used the HHsearch
program with default options, as implemented in the HH-suite
software package. HHsearch is considered one of the most
sensitive methods for protein homology detection [29]. It is
used by the Pfam database to determine the relationships between
families and to assign families to Pfam clans, in which homologous
families are grouped together [27,30]. HHsearch reports a
probability that the query and a hit are homologous, which is an
appropriate measure for deciding whether a hit is a true
homolog of the query and can be considered more intuitive
than the commonly used E-value for evaluating the
significance of a search result. According to the HHsearch/HH-suite
user guide (ftp://toolkit.genzentrum.lmu.de/pub/HH-suite/hhsuite-userguide.pdf),
a probability above 95% indicates that homology is nearly certain [29].

Consensus sequences 

Consensus sequences of each gene family used in this study were
generated from their corresponding HMM profiles. The HMM
profile of each family served as input for
the HMMEMIT program, and a consensus sequence was
obtained using the -C option as implemented in the HMMER3
package. The consensus sequence is formed using a plurality rule
that selects the maximum probability residue at each match state
of the HMM profile.
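The plurality rule itself is simple enough to state in a few lines of Python. This sketch assumes the match-state emission probabilities have already been extracted from the HMM (HMMER3 save files store them as negative natural logarithms, so they must be exponentiated, or the minimum taken instead of the maximum); the function and the toy model are hypothetical and do not reproduce hmmemit.

```python
from typing import List, Sequence

# The 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def plurality_consensus(match_emissions: Sequence[Sequence[float]],
                        alphabet: str = AMINO_ACIDS) -> str:
    """Pick the maximum-probability residue at every match state.

    match_emissions[k][i] is the emission probability of alphabet[i] at match
    state k. Ties go to the residue that comes first in the alphabet.
    """
    consensus = []
    for k, probs in enumerate(match_emissions):
        if len(probs) != len(alphabet):
            raise ValueError(f"match state {k}: expected {len(alphabet)} probabilities")
        best = max(range(len(alphabet)), key=lambda i: probs[i])
        consensus.append(alphabet[best])
    return "".join(consensus)


def state(**peaks: float) -> List[float]:
    """Toy emission vector: named residues get their probability, the rest share the remainder."""
    rest = (1.0 - sum(peaks.values())) / (len(AMINO_ACIDS) - len(peaks))
    return [peaks.get(aa, rest) for aa in AMINO_ACIDS]


toy_model = [state(D=0.6), state(R=0.5, K=0.3), state(Y=0.9)]
print(plurality_consensus(toy_model))  # DRY
```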

Multiple sequence alignment and phylogenetic tree construction

Multiple sequence alignments analyzed in this study were
generated using MAFFT version 6 (http://mafft.cbrc.jp/alignment/server/),
with BLOSUM62 as the scoring matrix and
option E-INS-i (recommended for sequences with conserved
motifs and carrying multiple domains) [31,32]. The
alignments were then manually inspected and trimmed to the 7TM regions
using Jalview [33]. The phylogenetic analysis was
performed using the Bayesian approach implemented in MrBayes
version 3.2 [34]. Markov chain Monte Carlo (MCMC) analysis
was used to estimate the posterior probabilities and branch lengths
of the trees. To determine the best amino acid substitution model,
a mixed model prior (aamodelpr = mixed) was used. A gamma-shaped
distribution was used to model the variation of evolutionary rates
across sites (lset rates = gamma). All Bayesian analyses conducted
in this study included two independent MCMC runs, each
using four parallel chains (three heated and
one cold). Each Markov chain was started from a random
tree and was set to run for 3,000,000 generations, and every
hundredth tree was sampled. To test the convergence of the two
independent runs, which started from different random trees,
convergence diagnostics were calculated every 1000 generations
(diagnfreq = 1000). To determine when to terminate the MCMC
generations, a stop rule was applied (standard deviation of split
frequencies <0.01). To ensure that the parameter
estimates were made only from samples drawn after the MCMC
runs had converged, we discarded the first 25%
of the sampled trees using the relative burn-in setting (relburnin = yes
and burninfrac = 0.25). After discarding the burn-in samples,
the MCMC runs were summarized and further investigated for
convergence of all parameters using the sump and sumt commands in
MrBayes. A consensus tree was then built from the
remaining 75% of the sampled trees with the MrBayes sumt
command using the 50% majority rule method. The sump
command was used to ensure that an adequate sample of the
posterior probability distribution was reached during the MCMC
procedure. The phylogenetic trees were drawn in FigTree 1.3.1
(http://tree.bio.ed.ac.uk/software/figtree/). The topology of all
the Bayesian phylogenetic trees supported by posterior
probabilities (PP) was cross-verified with bootstrap analysis (500
replicates) using the maximum likelihood (ML) approach implemented
in the PhyML (version 3.0) program [35]. Bootstrap values
are given as percentages for the nodes that received good
support in the ML approach.
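For reproducibility, the settings above can be collected into a MrBayes command block. The Python helper below is a hypothetical sketch that only assembles text: the parameter names follow the MrBayes 3.2 command syntax as we understand it (prset, lset, mcmc, sump, sumt), and the assumption that sump and sumt inherit the mcmc burn-in settings should be checked against the MrBayes manual before use.

```python
def mrbayes_block(ngen: int = 3_000_000, samplefreq: int = 100, nruns: int = 2,
                  nchains: int = 4, diagnfreq: int = 1000, stopval: float = 0.01,
                  burninfrac: float = 0.25) -> str:
    """Build a MrBayes 3.2 command block mirroring the settings described above."""
    commands = [
        "begin mrbayes;",
        # Average over the fixed amino-acid rate matrices instead of choosing one.
        "  prset aamodelpr=mixed;",
        # Gamma-distributed rate variation across sites.
        "  lset rates=gamma;",
        # Two runs x four chains (one cold, three heated); a tree every 100 generations;
        # stop once the standard deviation of split frequencies falls below stopval.
        f"  mcmc ngen={ngen} samplefreq={samplefreq} nruns={nruns} nchains={nchains}"
        f" diagnfreq={diagnfreq} stoprule=yes stopval={stopval}"
        f" relburnin=yes burninfrac={burninfrac};",
        # Summaries; burn-in assumed to carry over from the mcmc settings.
        "  sump;",
        "  sumt contype=halfcompat;",  # 50% majority-rule consensus tree
        "end;",
    ]
    return "\n".join(commands)


print(mrbayes_block())
```

The printed block can be appended to the NEXUS file holding the trimmed alignment and run with the `mb` executable.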

Results 

Search for novel members of the NemChR families across several eukaryotes

We performed Pfam HMM profile searches and manual
inspection of conserved amino acid motifs to check for the
presence of NemChR family sequences across 26 species from
several different eukaryotic phyla. Interestingly, among the 19
NemChR families, we identified putative srw (PF10324) family-like
sequences across several phyla of protostomes. However, we
did not identify full-length srw family-like sequences in any eukaryotes
other than the species analyzed from the protostome superphylum.
Overall, we identified 35 novel genes encoding the
7tm_GPCR_srw (PF10324) domain in the analyzed genomes of
insects (16 genes from 7 species), molluscs (16 genes from L.
gigantea) and S. mansoni (3 genes) (see Figure 1, Table S1 and
Dataset S1). Furthermore, we examined a total of 90 chemosensory-like
genes identified in the mollusc Aplysia californica [36] and
found that 29 of these sequences had 7tm_GPCR_srw as their
highest-scoring Pfam domain alignment (Figure 1, Table S1
and Dataset S1). In addition, our Pfam search identified fragments
in the cnidarian N. vectensis (Nv_210893) and the amoebozoan D.
fasciculatum (Df_F4PUK6) that had their highest alignment score
against the 7tm_GPCR_srw domain. Our analysis also found
sequence fragments in N. vectensis (Nv_205247) and in T.
adhaerens (Ta_58780) that had the 7TM_GPCR_Srsx (PF10320)
domain as their significant Pfam match. All putative NemChR
family members identified in the initial Pfam search were
subsequently examined by phylogenetic analysis to determine
whether they were true positive hits (described in the following sections).

Phylogenetic verification of srw family-like sequences identified in the Pfam search

To verify the list of 35 putative srw family-like sequences
identified in the Pfam search (described above), we combined
these sequences with a dataset consisting of known
and annotated srw family sequences from the nematodes C. elegans
and P. pacificus (obtained from database searches and earlier
studies [14]). Phylogenetic analysis of this dataset showed
that 29 of the 35 novel srw-like sequences identified in the genomes of
insects, L. gigantea and S. mansoni clustered with the annotated srw
family sequences from the nematodes C. elegans and P. pacificus
(94% posterior probability (PP), 51% ML bootstrap support; see
Figure 2; also see Figure 3). Furthermore, these sequences were
clearly separated from the Rhodopsin family members (see Figure 2).
The same topology was recovered in our analysis to identify the most
closely related Rhodopsin subfamily to the srw family (see Figure 4
and Figure S1). In addition, the novel srw members share several
conserved residues with the srw sequences from C. elegans
in the multiple sequence alignments (Figure 5 and
Figure S2). In contrast, the chemosensory receptor-like genes in
the mollusc A. californica (which had the 7tm_GPCR_srw domain as
their highest-scoring alignment) fall into a cluster separate
from the srw family representatives from C. elegans (Figure 2). In
addition, the putative fragments in N. vectensis and D. fasciculatum
(which had the 7tm_GPCR_srw domain as their best hit)
clustered separately from the srw cluster, suggesting that they
are divergent sequences or false positives from the Pfam search (Figure 2).
Taken together, we consider only the 29 sequences from
protostomes that clustered with the known srw family sequences of
C. elegans and P. pacificus to be putative orthologs of the srw family
(Figure 2). However, we describe these 29 protostome sequences
as putative orthologs solely on the basis of phylogenetic clustering
and not on the basis of function, because the chemosensory GPCRs
in C. elegans are classified largely by sequence analysis, and none
of the srw genes has been experimentally verified to function as
a chemosensory receptor [13,14,37].

[Figure 1 image: counts of the 19 NemChR families for each analyzed species, grouped by lineage (Deuterostomes, Bilateria, Arthropoda/Insecta, Mollusca, Cnidaria, Placozoa, Choanoflagellata, Fungi, Amoebozoa, Plants, Chromalveolata and other protists)]

Figure 1. Schematic diagram showing the nematode chemosensory gene families (NemChRs) across the analyzed taxa. Numbers within
parentheses in the srw column represent the number of srw-like sequences that clustered with the annotated srw family members from
nematodes (C. elegans and P. pacificus).
doi:10.1371/journal.pone.0093048.g001

Phylogenetic clarification of putative srsx family-like sequences identified in the Pfam search

Besides identifying novel srw family members from protostomes,
our analysis also found sequence fragments in N. vectensis
(Nv_205247) and in T. adhaerens (Ta_58780) that had the
7TM_GPCR_Srsx (PF10320) domain as their significant Pfam
match, with E-values (E) of 8.3e-05 and 1.5e-08, respectively.
However, the Pfam search showed that the fragment identified in T.
adhaerens (Ta_58780) can also be readily aligned with the 7tm_1
(PF00001) domain (residues 85 to 281, E = 2.9e-23), in addition to the
srsx domain (residues 40 to 99, E = 1.5e-08) (see Table S2).



Similarly, our Pfam search also identified a few vertebrate
olfactory receptors that had two significant Pfam HMM profile
hits, corresponding to the srsx domain (PF10320) and the 7tm_4 domain
(PF13853), within their transmembrane spanning regions. Such
sequences, with overlapping Pfam family hits within the same
sequence regions, were found in human (13 genes), mouse (43
genes), chicken (1 gene) and frog (1 gene) (Table S2). Although
these sequences had two significant Pfam HMM profile hits within
their transmembrane regions, in every instance the
7tm_4 domain had a better E-value than the srsx
domain (Table S2). According to the Pfam documentation [27],
if the same region of a sequence matches two Pfam families, the
match to one of them should be considered a false positive, but
if the domain hits are from the same Pfam clan, the overlap is
believed to reflect an evolutionary relationship between the two
Pfam families.
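The overlap rule can be made concrete with a short Python sketch. It is illustrative only: clan assignments must be supplied by the caller (none are assumed here), and keeping the hit with the lower E-value when no clan is shared is our own heuristic, chosen to mirror the E-value comparison above rather than a documented Pfam procedure. The example uses the Ta_58780 coordinates and E-values reported in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DomainHit:
    family: str
    start: int   # first residue of the domain match (1-based, inclusive)
    end: int     # last residue of the domain match (inclusive)
    evalue: float


def overlap_length(a: DomainHit, b: DomainHit) -> int:
    """Number of residues shared by two domain matches on the same sequence."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def review_overlaps(hits: List[DomainHit],
                    clan_of: Dict[str, Optional[str]]) -> List[Tuple[str, str, str]]:
    """Flag every pair of domain hits that share residues in the same sequence."""
    verdicts = []
    for i, a in enumerate(hits):
        for b in hits[i + 1:]:
            if overlap_length(a, b) == 0:
                continue
            clan_a, clan_b = clan_of.get(a.family), clan_of.get(b.family)
            if clan_a is not None and clan_a == clan_b:
                verdict = "same clan: overlap may reflect homology"
            else:
                # Heuristic (not a Pfam rule): keep the hit with the lower E-value.
                better = a if a.evalue <= b.evalue else b
                verdict = f"no shared clan: one hit is likely a false positive; keep {better.family}"
            verdicts.append((a.family, b.family, verdict))
    return verdicts


# Coordinates and E-values reported for Ta_58780 in the text above.
ta_58780 = [DomainHit("7tm_1", 85, 281, 2.9e-23), DomainHit("srsx", 40, 99, 1.5e-08)]
print(overlap_length(*ta_58780))  # 15
print(review_overlaps(ta_58780, clan_of={}))  # no clan information supplied
```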

To clarify these Pfam predictions and the relationships
between the srsx family and the vertebrate olfactory
receptors (7tm_4), we performed phylogenetic analysis on a
comprehensive dataset. The dataset included 1) srsx family
sequences from C. elegans; 2) vertebrate olfactory sequences that
had the srsx domain (PF10320) as a significant hit; 3) functional
olfactory family sequences from humans that encode the
7tm_4 (olfactory) domain; 4) vertebrate-like olfactory genes
identified in N. vectensis [26]; 5) the srsx-like fragments identified in
T. adhaerens (Ta_58780) and N. vectensis (Nv_205247); 6) all
novel srw-like sequences; and 7) consensus sequences for the other
NemChR families. The overall unrooted tree shows a
topology in which the srw and srsx family sequences from C.
elegans fall into two distinct and separate clusters (Figure 3).
Furthermore, the vertebrate olfactory receptors that unusually
showed high similarity to the srsx domain in the Pfam
search clustered (100% PP, 99% ML bootstrap) with the olfactory
receptors that had the 7tm_4 (olfactory) domain as their
highest-scoring alignment. In addition, the fragment Nv_205247
clustered (100% PP, 84% ML bootstrap) with the vertebrate-like
olfactory genes identified in N. vectensis, indicating that it is likely
a member of the olfactory receptor family rather than of the srsx
family (Figure 3). Similarly, the fragment Ta_58780 in T. adhaerens,
which had two significant Pfam HMM profile hits corresponding to
the 7tm_1 and srsx domains, clustered separately from the other
srsx proteins (Figure 3). This suggests that although Ta_58780
bears resemblance to srsx, it is more likely a member of the
7tm_1 family (Figure 3).

Figure 2. Phylogenetic relationships of the Rhodopsin subfamilies, the srw family from C. elegans and the srw-like sequences
identified from several protostome species. The tree topology was inferred from Bayesian analysis with a gamma correction using MrBayes
software. The Rhodopsin subfamilies from C. elegans include, among others, the peptide (violet), amine (light blue) and SOG (olive green)
subfamilies. The major clade that clusters the novel srw-like sequences with the known srw family members from nematodes (C. elegans and
P. pacificus) is highlighted in red (the conserved 7TM regions of the sequences from the srw clade are shown in Figure S2). Previously known
srw family members from nematodes (C. elegans and P. pacificus) are marked with a star symbol. Sequences that had the highest alignment score
against the 7tm_GPCR_srw domain but failed to cluster within the srw clade are marked with black edges. Similarly, the chemosensory genes in
A. californica that had the highest alignment score against the 7tm_GPCR_srw domain in our Pfam search, but clustered separately from the srw
family members from nematodes, are highlighted in grey. We followed the naming previously used for the A. californica genes [36] for easy
cross-verification. Abbreviations used in the figure include Ap (Acyrthosiphon pisum), Ag (Anopheles gambiae), Am (Apis mellifera), Ce
(Caenorhabditis elegans), Df (Dictyostelium fasciculatum), Dm (Drosophila melanogaster), Dp (Daphnia pulex), Dw (Drosophila willistoni),
Lg (Lottia gigantea), Ph (Pediculus humanus), Pp (Pristionchus pacificus), Sm (Schistosoma mansoni) and Nv (Nematostella vectensis).
Posterior probabilities and bootstrap values (within parentheses) are shown as percentages for the major nodes.
doi:10.1371/journal.pone.0093048.g002

[Figure 3 image: tree with clades labelled vertebrate olfactory receptors (7tm_4), olfactory-like receptors in B. floridae and representatives of NemChR families]

Figure 3. Phylogenetic relationships between the srsx, srw and olfactory-like receptors. The phylogenetic tree includes 1)
representatives of the srsx family in C. elegans, 2) previously known srw family members from nematodes (C. elegans and P. pacificus),
3) srw-like sequences identified in this study, 4) vertebrate olfactory genes (7tm_4), 5) olfactory-like receptors in B. floridae,
6) olfactory-like receptors in N. vectensis [26], 7) vertebrate olfactory sequences that had a significant alignment score against the
7tm_GPCR_srsx domain (see Table S2), 8) a consensus representative for each NemChR family, and 9) the sequence fragments from
N. vectensis (Nv_205247) and T. adhaerens (Ta_58780) that had the highest-scoring alignment against the 7tm_GPCR_Srsx domain (indicated
with a star symbol). Posterior probabilities and bootstrap values (within parentheses) are shown as percentages for the major nodes.
doi:10.1371/journal.pone.0093048.g003

Figure 4. Phylogenetic trees showing the Rhodopsin subfamilies most closely related to the srw family. Olfactory-like genes identified
in N. vectensis were used as the outgroup to root the trees. The Rhodopsin subfamily sequences included in the trees were obtained using
srw family sequences as queries in BLASTP searches against the Rhodopsin family repertoires of C. elegans and N. vectensis. The top hits
included sequences from the peptide (violet), SOG (olive green) and amine (dark green) subfamilies and some unclassified Rhodopsin family
sequences (black) that did not have at least four of their five best hits from the same subfamily in a BLASTP search against the human
GPCR repertoire (see Materials and Methods). The tree topologies show that the peptide (violet) and SOG (olive green) subfamilies are
placed basal to the srw clade. Posterior probabilities and bootstrap values (within parentheses) are shown as percentages for the major
nodes. The branches that contain srw family members from nematodes (C. elegans and P. pacificus) are indicated with a star symbol. The
branches that contain sequences that were predicted to be srw in the Pfam search, yet separated from the node clustering the srw family
members from nematodes, are indicated with a hash (#) symbol. The A. californica chemosensory genes that had the highest alignment score
against the 7tm_GPCR_srw domain in our Pfam search are highlighted in a grey box. See supplementary Figure S1 to view the phylogenetic
relationships of the Rhodopsin subfamilies and the srw family in all four analyzed species (C. elegans, N. vectensis, D. melanogaster
and T. adhaerens).
doi:10.1371/journal.pone.0093048.g004



Identification of the most closely related Rhodopsin subfamily to the srw family

In order to find the closest related Rhodopsin subfamily to the srw 
family, we compared each Rhodopsin subfamily from four different 
species with the srw family. We included two species that diverged 
before the protostome-deuterostome split, T, adhaerens and jV. 
vectensis, which contain large and diverse repertoires of Rhodopsin 
family sequences. Also, T. adhaerens and N. vectensis represent the 
most distant lineages where the vertebrate like Rhodopsin subfam- 
ilies have been identified [23,24,25]. Additionally, we included 
D. melanogaster and C. elegans from the protostome superphylum. 
Our analysis strategy included two independent ways to identify 
the most closely related Rhodopsin subfamily to the srw family. First, 
the HMM-HMM profile based comparisons were performed using 
the HHsearch program. We compared the HMM profiles of the 
srw family from different species against a database that contained 
Figure 5. Multiple alignments of chemosensory GPCR families and Rhodopsin-like GPCR subfamilies. Every sequence in the alignment is a consensus sequence obtained from the HMM model of the respective family using the HMMEMIT program (see Materials and Methods). We include consensus sequences for 1) vertebrate olfactory receptor family members from human (Hs_7tm_4), 2) srw-like sequences identified in the genomes of insects, molluscs and S. mansoni (red stars), 3) Rhodopsin subfamilies from C. elegans, N. vectensis and T. adhaerens (green stars), 4) olfactory-like genes identified in N. vectensis (blue star), 5) srw-like sequences identified in A. californica (yellow star), 6) the 19 nematode chemosensory GPCR families from C. elegans (black stars), 7) the mammalian taste receptor family (TAS2R), 8) vomeronasal receptors (V1R) and 9) the ancient Dictyostelium cyclic AMP receptors (Dicty_CAR). The consensus sequences marked with a red tick symbol are obtained from Pfam HMM models downloaded from the Pfam database, while the other sequences are obtained from HMM models constructed from multiple alignments of our sequence datasets. The red rectangular boxes indicate residues that are conserved across the srws and the Rhodopsin subgroups, whereas the black rectangular boxes indicate residues that are specific to the srw family.
doi:10.1371/journal.pone.0093048.g005



HMM profiles of the Rhodopsin subfamilies from T. adhaerens, N. vectensis, D. melanogaster and C. elegans. The database also included HMM profiles of families belonging to the GPCR_A Pfam clan. Second, phylogenetic analysis was performed on a dataset that included representatives of all Rhodopsin subfamilies (peptides, amines, SOG, etc.) and the entire set of srw family members identified in this study. A flowchart describing the strategy is shown in Figure S3.

Interestingly, the HMM-HMM profile-based comparisons showed that the peptide receptor subfamily was the most closely related to the srw family (see Figure S4). HMM profiles of the srw family from 12 different species served as separate queries against our database of concatenated HMM profiles (see above). For at least nine of these 12 srw query profiles, the HMM profile of the peptide receptor subfamily was the top hit with greater than 95% probability (see Methods for an explanation of the HHsearch probability score). Moreover, among the other Rhodopsin subfamilies, the SOG (somatostatin/opioid/galanin) subfamily was the second closest to the srw family (Figure S4). The amine subfamily was also among the top five hits, but with a relatively low probability score (around 90%) compared with the peptide and SOG subfamilies.
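The tally described above (how many srw query profiles return a given Rhodopsin subfamily as their top hit above 95% probability) can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this study: the query names, target labels and probability values below are hypothetical, and we assume the HHsearch hit lists have already been parsed into (target, probability) pairs.

```python
from collections import Counter

# Hypothetical parsed HHsearch output: for each srw query profile, a list of
# (target profile, probability %) pairs. Values are illustrative only.
hits = {
    "srw_query_A": [("peptide", 99.1), ("SOG", 98.0), ("amine", 90.2)],
    "srw_query_B": [("SOG", 97.5), ("peptide", 97.0), ("amine", 88.0)],
    "srw_query_C": [("peptide", 96.4), ("SOG", 95.9), ("amine", 89.5)],
    "srw_query_D": [("peptide", 93.0), ("SOG", 92.1), ("amine", 85.0)],
}


def top_hit(target_probs, threshold=95.0):
    """Return the best-scoring target if its probability exceeds the threshold."""
    best_target, best_prob = max(target_probs, key=lambda pair: pair[1])
    return best_target if best_prob > threshold else None


def tally_top_hits(hits, threshold=95.0):
    """Count how often each target is the confident top hit across queries."""
    counts = Counter()
    for target_probs in hits.values():
        best = top_hit(target_probs, threshold)
        if best is not None:
            counts[best] += 1
    return counts


print(tally_top_hits(hits))  # Counter({'peptide': 2, 'SOG': 1})
```

Query D contributes nothing because its best hit stays below the threshold, which mirrors why the count above is reported as "at least nine of these 12" rather than all 12.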

To verify the HMM-HMM profile comparison results, we performed phylogenetic analysis comparing the Rhodopsin subfamilies from four different species (T. adhaerens, N. vectensis, C. elegans and D. melanogaster) with the entire set of srw family members. A phylogenetic tree was constructed for each species, and in every tree the vertebrate-like OR genes identified in the cnidarian N. vectensis were used as the outgroup to root the tree. The phylogenetic grouping in all four species indicated that, among all Rhodopsin subfamily members, the peptide subfamily was the most closely related to the srw family (Figure 4 and Figure S1). In all four phylogenetic trees, the peptide subfamily sequences were consistently placed basal to the srw clade, despite the inclusion of representative members from different Rhodopsin subfamilies (Figure 4 and Figure S1). The confidence measures (PP and bootstrap) were high in the trees corresponding to C. elegans (93%, 82%), N. vectensis (98%, 81%) and T. adhaerens (93%) (Figure 4 and Figure S1). Moreover, the phylogeny supports the notion that the SOG subfamily is also closely related to the srw family. The SOG subfamily members were placed on the same branch as the peptide subfamily members in the trees representing C. elegans, N. vectensis and D. melanogaster, whereas in T. adhaerens they clustered basal to the peptide subfamily (Figure 4). Overall, this was consistent with the HMM-HMM profile-based comparisons, which suggested that the peptide receptor subfamily is the Rhodopsin subfamily most closely related to the srw family, followed by the SOG subfamily (Figure S4).
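Both confidence measures reported above are clade frequencies. Bootstrap support is the percentage of bootstrap-replicate trees that recover a given clade, and a Bayesian posterior probability is estimated by the same frequency over trees sampled (after burn-in) from the MCMC chain. The following is a minimal Python sketch of that count. It assumes each tree has already been reduced to the set of clades it contains. The tip labels srw1, srw2, pep1 and amine1 are hypothetical placeholders, not sequences from this study.

```python
# Toy replicate trees, each summarised as the set of clades it contains.
# Clades are frozensets of hypothetical tip labels.
replicates = [
    {frozenset({"srw1", "srw2"}), frozenset({"srw1", "srw2", "pep1"})},
    {frozenset({"srw1", "srw2"}), frozenset({"srw1", "srw2", "pep1"})},
    {frozenset({"srw1", "srw2"}), frozenset({"srw1", "srw2", "amine1"})},
    {frozenset({"srw1", "srw2"}), frozenset({"srw1", "srw2", "pep1"})},
]


def clade_support(clade, trees):
    """Percentage of trees that contain exactly this clade."""
    target = frozenset(clade)
    n_with_clade = sum(1 for clades in trees if target in clades)
    return 100.0 * n_with_clade / len(trees)


print(clade_support({"srw1", "srw2"}, replicates))          # 100.0
print(clade_support({"srw1", "srw2", "pep1"}, replicates))  # 75.0
```

In practice, the clade sets would be extracted from the replicate or sampled trees with a phylogenetics library rather than typed by hand.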

Pairwise HMM-HMM comparisons between the 19 NemChR families

To detect the relationships between the 19 NemChR families, we performed HMM-HMM comparisons of the Pfam HMM profiles of all 19 families using the HHsearch program. These comparisons showed that all the families belonging to the Str superfamily (srd, srh, sri, srj and str families) shared an HHsearch probability score of around 95% with each other (Figure S4). However, the families belonging to the Sra superfamily (sra, srab, srb and sre) and the Srg superfamily (srg, sru, srv, srx, srt and srxa) did not exceed 95% probability in all pairwise HMM-HMM alignments. For example, in the Sra superfamily, the alignment between the HMMs of the sra and sre families had a low probability score of 28%. Similarly, in the Srg superfamily, the pairwise alignments between the HMMs of the srxa and srx families and between those of the srt and srv families had very low probability scores of around 18.4% and 13.2%, respectively. None of the individual families (srbc, srsx, srw and srz) had reliable hits when compared with the other NemChR families. Instead, the srw and srsx families shared greater than 95% probability with the peptide receptor subfamily of the Rhodopsin family of GPCRs.
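The criterion applied above can be expressed as a small check: a superfamily is supported only if every pairwise HMM-HMM alignment among its member families exceeds 95% probability. In this hedged Python sketch, only the three probabilities quoted in the text (sra/sre, srx/srxa and srt/srv) are real values. The famA/famB pair is a hypothetical placeholder, and pairs with no recorded score are treated as unsupported.

```python
from itertools import combinations

# Probabilities (%) quoted in the text, keyed by unordered family pair.
REPORTED = {
    frozenset({"sra", "sre"}): 28.0,
    frozenset({"srx", "srxa"}): 18.4,
    frozenset({"srt", "srv"}): 13.2,
}


def weak_pairs(families, pairwise, threshold=95.0):
    """Return (a, b, probability) for every within-group pair that is missing
    or does not exceed the threshold; an empty list means the group passes."""
    weak = []
    for a, b in combinations(sorted(families), 2):
        prob = pairwise.get(frozenset({a, b}))
        if prob is None or prob <= threshold:
            weak.append((a, b, prob))
    return weak


print(weak_pairs(["sra", "sre"], REPORTED))   # [('sra', 'sre', 28.0)]
print(weak_pairs(["srxa", "srx"], REPORTED))  # [('srx', 'srxa', 18.4)]

# Hypothetical group whose only pair clears the threshold.
toy = {frozenset({"famA", "famB"}): 96.0}
print(weak_pairs(["famA", "famB"], toy))      # []
```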

Discussion 

Nematode chemosensory GPCRs (NemChRs) in C. elegans, which are crucial for sensing various environmental cues, are classified into 19 gene families based on sequence similarities and phylogenetic analysis [14]. These 19 gene families are known to be specific to the nematode lineage [14,37] and were initially thought to have split from the ancestral Rhodopsin family of GPCRs [22]. However, their relationship with the large Rhodopsin family of GPCRs remains obscure, as the NemChR families do not share a clear sequence relationship with the Rhodopsin family of GPCRs, except for the srw family [14]. Moreover, the presence of putative homologs of these 19 NemChR families in species other than nematodes has not been thoroughly examined. In this study, we examined 26 eukaryotic genomes and provide the first evidence for the presence of the nematode chemosensory gene family srw (7tm_GPCR_srw; PF10324) [14] across several phyla of protostomes (Figure 1 and Figure 2). Furthermore, based on phylogenetic analysis and HMM-HMM comparisons, we clarify the relationships between the srw family of NemChRs and the Rhodopsin subfamilies of GPCRs.

Here, we investigated 26 eukaryotic species, covering four basal eukaryotic branches, and found 29 putative orthologs of the srw family across the insect, mollusc and S. mansoni genomes. These 29 novel srw members unambiguously clustered with the previously known and annotated srw family from nematodes (C. elegans and P. pacificus; 94% PP) (Figure 2). Furthermore, these novel srw members share several common motifs with the previously known and annotated srw sequences from C. elegans (see Figure 5 and Figure S2). Based on these findings, we demonstrate that the srw family emerged much earlier than the other 18 nematode chemosensory families, which we could not find in species that diverged earlier than the nematodes. However, during the course of protostome evolution, the srw family has also undergone species-specific losses within the mollusc lineage, as we could identify srw members in L. gigantea but not in A. californica.
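The reasoning above can be sketched as follows. A family's origin is placed at the last common ancestor of all species that carry it, and absences beneath that node are treated as losses. The topology is a simplified, illustrative protostome tree, with human as a stand-in deuterostome. It is not the tree in Figure 6, and the presence set only encodes the srw distribution described in this section. Reporting each absent tip separately is also a simplification. A full parsimony (for example, Dollo) reconstruction would place a single loss on an internal branch when a whole clade lacks the family.

```python
# Simplified, illustrative topology (NOT the tree in Figure 6).
# Tips are strings; internal nodes are tuples of child nodes.
TREE = (
    "Human",  # stand-in deuterostome outgroup
    (
        (("C. elegans", "P. pacificus"), "insects"),       # Ecdysozoa
        (("L. gigantea", "A. californica"), "S. mansoni"),  # Lophotrochozoa
    ),  # protostomes
)


def leaves(node):
    """Return the set of tip labels below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def lca(node, targets):
    """Descend to the smallest subtree whose tips include every target tip."""
    if isinstance(node, str):
        return node
    for child in node:
        if targets <= leaves(child):
            return lca(child, targets)
    return node


def origin_and_losses(tree, present):
    """Origin = LCA of all carriers; tips under the origin lacking the family are losses."""
    present = set(present)
    origin = lca(tree, present)
    losses = sorted(leaves(origin) - present)
    return origin, losses


srw_present = {"C. elegans", "P. pacificus", "insects", "L. gigantea", "S. mansoni"}
origin, losses = origin_and_losses(TREE, srw_present)
print(sorted(leaves(origin)))  # the six protostome tips
print(losses)  # ['A. californica']
```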

Through further analysis using HMM-HMM comparisons, we sought to delineate the relationships between the srw family and the other 18 nematode chemosensory families, as well as the Rhodopsin subfamilies. However, the results show that it is quite difficult to identify a reliable similarity between the srw family and the other 18 NemChR families, even with sensitive HMM-HMM profile-based comparisons. This might be due to high divergence within and between these families. Therefore, it is still unclear whether some of the other NemChR families could have split from the srw family. In contrast, we found strong sequence similarity between the srw family and the Rhodopsin subfamilies (Figure S4). To explore this further, we performed a comprehensive phylogenetic analysis (Figure 4) of the srw family and the Rhodopsin subfamilies. We observed that the peptide and SOG subfamilies of the large Rhodopsin family are the closest relatives of the srw gene family. This is also supported by BLASTP searches using the srw sequences as queries against the NCBI nr database, which returned peptide receptors among the top hits. Based on the evidence from the phylogenetic analysis and HMM-HMM comparisons, we suggest that the srw family originally split from the large Rhodopsin family, possibly from the peptide and/or SOG subfamilies, somewhere close to the divergence of the common ancestor of protostomes (Figure 6).

In addition to the srw family, our HMM-HMM profile comparison results suggest that the srsx family shares greater than 95% probability (see Methods for the definition of the HHsearch probability score) with the HMMs of the Rhodopsin subfamilies (Figure S4). The HHsearch results show that the srsx family shares around 95.7% probability with the peptide receptor subgroup of C. elegans. This suggests that the srsx family may have duplicated from the Rhodopsin subfamilies within the nematode lineage. Interestingly, our results also show that the srsx family shares significant similarity with some vertebrate olfactory receptors, as some vertebrate sequences show similarity to both the 7tm_GPCR_Srsx domain and the 7tm_4 (olfactory) domain within their transmembrane spanning regions (see Table S2). As the 7tm_GPCR_Srsx domain and the 7tm_4 (olfactory) domain belong to the same Pfam clan (GPCR_A clan), we suggest that the srsx family and the 7tm_4 family perhaps share a common origin, somewhere before the split of protostomes and deuterostomes. Furthermore, from a previous study [26], as well as from our phylogenetic analysis (Figure 3), it seems evident that N. vectensis has chordate-like olfactory genes that have expanded within the cnidarian lineage, similar to the expansion of this family
Figure 6. A schematic presentation of the evolution of GPCR-mediated chemosensory gene families. The eukaryotic evolutionary tree was constructed with reference to the Tree of Life Web Project (http://tolweb.org/tree/phylogeny.html). Each gene family is represented by colored symbols, and their presence and absence were mapped onto the eukaryotic branches. A red arc represents the hypothetical origin of the gene families. The hypothetical origins of these gene families, including the chordate-like olfactory receptor family, the vomeronasal type 1 family and the taste 2 receptors, were obtained from previous studies [2,16,18,22,24]. Branch lengths are not drawn to represent actual evolutionary distances.
doi:10.1371/journal.pone.0093048.g006



in deuterostomes. In contrast, the chordate-like olfactory receptors were lost in all protostomes, which may reflect the expansion of NemChRs in this lineage. Taken together, these results suggest that the common ancestor of the protostomes and deuterostomes may have had ancestral representatives for most of the deuterostome chemosensory gene families and the srw gene family (Figure 6). This suggestion is supported by our previous study, which argues that most of the chemosensory GPCRs and the Rhodopsin family GPCRs share a common origin, somewhere close to the divergence of the cnidarians from the eumetazoans [22].

In summary, we have performed a detailed mining of the nematode chemosensory gene families across 26 eukaryotic genomes and found that the srw gene family has putative orthologs across several phyla of protostomes. Furthermore, we show that the srw gene family split from the large Rhodopsin family, possibly from the peptide and/or SOG subfamilies, somewhere close to the divergence of the common ancestor of protostomes. Our results provide important insights into the evolutionary events that shaped the GPCR genes responsible for sensing the environment.

Supporting Information 

Figure S1 Phylogenetic trees showing the Rhodopsin subfamilies most closely related to the srw family in all four analyzed species (C. elegans, N. vectensis, D. melanogaster and T. adhaerens). Olfactory-like genes identified in N.
Figure S2 Multiple alignment showing conserved regions between the novel srw members and the srws from C. elegans and P. pacificus. The protein IDs of novel srw-like sequences identified in the genomes of insects, molluscs and S. mansoni are highlighted in red.
Figure S3 Flowchart describing the sequence analysis strategy used to identify the Rhodopsin subfamily closest to the srw family.
Figure S4 Schematic representation of HMM-HMM profile comparison scores between the Rhodopsin and